by Mercy F. Nyambura Kariuki
You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.
Characters and teams
- Cyclistic: A bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use them to commute to work each day.
- Lily Moreno: The director of marketing and your manager. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.
- Cyclistic marketing analytics team: A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy. You joined this team six months ago and have been busy learning about Cyclistic’s mission and business goals — as well as how you, as a junior data analyst, can help Cyclistic achieve them.
- Cyclistic executive team: The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.
About the company
In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime. Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments.
One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members. Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.
Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.
Ask Three questions will guide the future marketing program:
Prepare You will use Cyclistic’s historical trip data to analyze and identify trends. Download the previous 12 months of Cyclistic trip data here. (Note: The datasets have a different name because Cyclistic is a fictional company. For the purposes of this case study, the datasets are appropriate and will enable you to answer the business questions. The data has been made available by Motivate International Inc. under this license.)
This is public data that you can use to explore how different customer types are using Cyclistic bikes. But note that data-privacy issues prohibit you from using riders’ personally identifiable information. This means that you won’t be able to connect pass purchases to credit card numbers to determine if casual riders live in the Cyclistic service area or if they have purchased multiple single passes.
Analyze Now that your data is stored appropriately and has been prepared for analysis, start putting it to work.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
%matplotlib inline
june2020=pd.read_csv("C:/Users/kariu/OneDrive/Desktop/Data Science/Google_Capstone_Project/Cyclistic Trip Data/2020_06-divvy-tripdata/202006-divvy-tripdata.csv")
july2020=pd.read_csv("C:/Users/kariu/OneDrive/Desktop/Data Science/Google_Capstone_Project/Cyclistic Trip Data/2020_07-divvy-tripdata/202007-divvy-tripdata.csv")
aug2020=pd.read_csv("C:/Users/kariu/OneDrive/Desktop/Data Science/Google_Capstone_Project/Cyclistic Trip Data/2020_08-divvy-tripdata/202008-divvy-tripdata.csv")
sept2020=pd.read_csv("C:/Users/kariu/OneDrive/Desktop/Data Science/Google_Capstone_Project/Cyclistic Trip Data/2020_09-divvy-tripdata/202009-divvy-tripdata.csv")
oct2020=pd.read_csv("C:/Users/kariu/OneDrive/Desktop/Data Science/Google_Capstone_Project/Cyclistic Trip Data/2020_10-divvy-tripdata/202010-divvy-tripdata.csv")
nov2020=pd.read_csv("C:/Users/kariu/OneDrive/Desktop/Data Science/Google_Capstone_Project/Cyclistic Trip Data/2020_11-divvy-tripdata/202011-divvy-tripdata.csv")
dec2020=pd.read_csv("C:/Users/kariu/OneDrive/Desktop/Data Science/Google_Capstone_Project/Cyclistic Trip Data/2020_12-divvy-tripdata/202012-divvy-tripdata.csv")
jan2021=pd.read_csv("C:/Users/kariu/OneDrive/Desktop/Data Science/Google_Capstone_Project/Cyclistic Trip Data/2021_01-divvy-tripdata/202101-divvy-tripdata.csv")
feb2021=pd.read_csv("C:/Users/kariu/OneDrive/Desktop/Data Science/Google_Capstone_Project/Cyclistic Trip Data/2021_02-divvy-tripdata/202102-divvy-tripdata.csv")
march2021=pd.read_csv("C:/Users/kariu/OneDrive/Desktop/Data Science/Google_Capstone_Project/Cyclistic Trip Data/2021_03-divvy-tripdata/202103-divvy-tripdata.csv")
april2021=pd.read_csv("C:/Users/kariu/OneDrive/Desktop/Data Science/Google_Capstone_Project/Cyclistic Trip Data/2021_04-divvy-tripdata/202104-divvy-tripdata.csv")
may2021=pd.read_csv("C:/Users/kariu/OneDrive/Desktop/Data Science/Google_Capstone_Project/Cyclistic Trip Data/2021_05-divvy-tripdata/202105-divvy-tripdata.csv")
june2021=pd.read_csv("C:/Users/kariu/OneDrive/Desktop/Data Science/Google_Capstone_Project/Cyclistic Trip Data/2021_06-divvy-tripdata/202106-divvy-tripdata.csv")
july2021=pd.read_csv("C:/Users/kariu/OneDrive/Desktop/Data Science/Google_Capstone_Project/Cyclistic Trip Data/2021_07-divvy-tripdata/202107-divvy-tripdata.csv")
aug2021=pd.read_csv("C:/Users/kariu/OneDrive/Desktop/Data Science/Google_Capstone_Project/Cyclistic Trip Data/2021_08-divvy-tripdata/202108-divvy-tripdata.csv")
sept2021=pd.read_csv("C:/Users/kariu/OneDrive/Desktop/Data Science/Google_Capstone_Project/Cyclistic Trip Data/2021_09-divvy-tripdata/202109-divvy-tripdata.csv")
oct2021=pd.read_csv("C:/Users/kariu/OneDrive/Desktop/Data Science/Google_Capstone_Project/Cyclistic Trip Data/2021_10-divvy-tripdata/202110-divvy-tripdata.csv")
nov2021=pd.read_csv("C:/Users/kariu/OneDrive/Desktop/Data Science/Google_Capstone_Project/Cyclistic Trip Data/2021_11-divvy-tripdata/202111-divvy-tripdata.csv")
dec2021=pd.read_csv("C:/Users/kariu/OneDrive/Desktop/Data Science/Google_Capstone_Project/Cyclistic Trip Data/2021_12-divvy-tripdata/202112-divvy-tripdata.csv")
jan2022=pd.read_csv("C:/Users/kariu/OneDrive/Desktop/Data Science/Google_Capstone_Project/Cyclistic Trip Data/2022_01-divvy-tripdata/202201-divvy-tripdata.csv")
feb2022=pd.read_csv("C:/Users/kariu/OneDrive/Desktop/Data Science/Google_Capstone_Project/Cyclistic Trip Data/2022_02-divvy-tripdata/202202-divvy-tripdata.csv")
march2022=pd.read_csv("C:/Users/kariu/OneDrive/Desktop/Data Science/Google_Capstone_Project/Cyclistic Trip Data/2022_03-divvy-tripdata/202203-divvy-tripdata.csv")
april2022=pd.read_csv("C:/Users/kariu/OneDrive/Desktop/Data Science/Google_Capstone_Project/Cyclistic Trip Data/2022_04-divvy-tripdata/202204-divvy-tripdata.csv")
data=pd.concat([june2020, july2020, aug2020, sept2020, oct2020, nov2020, dec2020, jan2021, feb2021, march2021, april2021, may2021, june2021, july2021, aug2021, sept2021, oct2021, nov2021, dec2021, jan2022, feb2022, march2022, april2022], ignore_index=True)
data.head()
| ride_id | rideable_type | started_at | ended_at | start_station_name | start_station_id | end_station_name | end_station_id | start_lat | start_lng | end_lat | end_lng | member_casual | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8CD5DE2C2B6C4CFC | docked_bike | 2020-06-13 23:24:48 | 2020-06-13 23:36:55 | Wilton Ave & Belmont Ave | 117 | Damen Ave & Clybourn Ave | 163.0 | 41.940180 | -87.653040 | 41.931931 | -87.677856 | casual |
| 1 | 9A191EB2C751D85D | docked_bike | 2020-06-26 07:26:10 | 2020-06-26 07:31:58 | Federal St & Polk St | 41 | Daley Center Plaza | 81.0 | 41.872077 | -87.629543 | 41.884241 | -87.629634 | member |
| 2 | F37D14B0B5659BCF | docked_bike | 2020-06-23 17:12:41 | 2020-06-23 17:21:14 | Daley Center Plaza | 81 | State St & Harrison St | 5.0 | 41.884241 | -87.629634 | 41.874053 | -87.627716 | member |
| 3 | C41237B506E85FA1 | docked_bike | 2020-06-20 01:09:35 | 2020-06-20 01:28:24 | Broadway & Cornelia Ave | 303 | Broadway & Berwyn Ave | 294.0 | 41.945529 | -87.646439 | 41.978353 | -87.659753 | casual |
| 4 | 4B51B3B0BDA7787C | docked_bike | 2020-06-25 16:59:25 | 2020-06-25 17:08:48 | Sheffield Ave & Webster Ave | 327 | Wilton Ave & Belmont Ave | 117.0 | 41.921540 | -87.653818 | 41.940180 | -87.653040 | casual |
data.start_station_name.nunique()
891
data.end_station_name.nunique()
892
data.rideable_type.nunique()
3
data.rideable_type.unique()
array(['docked_bike', 'electric_bike', 'classic_bike'], dtype=object)
data.start_station_id.nunique()
1525
data.end_station_id.nunique()
1525
data.member_casual.nunique()
2
data.member_casual.unique()
array(['casual', 'member'], dtype=object)
data.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 9299479 entries, 0 to 9299478 Data columns (total 13 columns): # Column Dtype --- ------ ----- 0 ride_id object 1 rideable_type object 2 started_at object 3 ended_at object 4 start_station_name object 5 start_station_id object 6 end_station_name object 7 end_station_id object 8 start_lat float64 9 start_lng float64 10 end_lat float64 11 end_lng float64 12 member_casual object dtypes: float64(4), object(9) memory usage: 922.3+ MB
data.duplicated().sum()
0
data.isnull().sum()
ride_id 0 rideable_type 0 started_at 0 ended_at 0 start_station_name 938438 start_station_id 939061 end_station_name 1014357 end_station_id 1014818 start_lat 0 start_lng 0 end_lat 9351 end_lng 9351 member_casual 0 dtype: int64
Drop unnecessary columns
data=data.drop(['start_station_name', 'end_station_name', 'start_station_id', 'end_station_id'], axis=1)
data.sample(2)
| ride_id | rideable_type | started_at | ended_at | start_lat | start_lng | end_lat | end_lng | member_casual | |
|---|---|---|---|---|---|---|---|---|---|
| 5964724 | 8BF0D0549757C771 | electric_bike | 2021-08-25 16:04:28 | 2021-08-25 16:25:52 | 41.88447 | -87.619454 | 41.890012 | -87.631095 | member |
| 9206726 | 02A235B9DF0CE296 | electric_bike | 2022-04-06 17:57:02 | 2022-04-06 18:01:15 | 42.05000 | -87.670000 | 42.060000 | -87.680000 | casual |
Rename columns
data.rename(columns={'rideable_type' : 'Bike_Type',
'started_at' : 'Start_Time',
'ended_at' : 'End_Time',
'member_casual' : 'Member_Status',
'start_lat' : 'Starting_Latitude',
'start_lng' : 'Starting_Longitude',
'end_lat' : 'Ending_Latitude',
'end_lng' : 'Ending_Longitude'}, inplace=True)
data.sample(3)
| ride_id | Bike_Type | Start_Time | End_Time | Starting_Latitude | Starting_Longitude | Ending_Latitude | Ending_Longitude | Member_Status | |
|---|---|---|---|---|---|---|---|---|---|
| 5169544 | 28B99648FE40403C | classic_bike | 2021-07-30 22:43:51 | 2021-07-30 23:03:41 | 41.915689 | -87.634600 | 41.917805 | -87.682437 | member |
| 5568076 | B04F4D14461E0EBC | classic_bike | 2021-07-17 15:13:53 | 2021-07-17 15:22:44 | 41.892570 | -87.614492 | 41.892278 | -87.612043 | casual |
| 57167 | D05276E16CAF332B | docked_bike | 2020-06-12 06:36:10 | 2020-06-12 06:40:00 | 41.872187 | -87.661501 | 41.872950 | -87.669130 | member |
Add new column to calculate the trip duration
Convert
start_timeandend_timeto datetime dtype first
data["End_Time"] = pd.to_datetime(data["End_Time"])
data["Start_Time"] = pd.to_datetime(data["Start_Time"])
data.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 9299479 entries, 0 to 9299478 Data columns (total 9 columns): # Column Dtype --- ------ ----- 0 ride_id object 1 Bike_Type object 2 Start_Time datetime64[ns] 3 End_Time datetime64[ns] 4 Starting_Latitude float64 5 Starting_Longitude float64 6 Ending_Latitude float64 7 Ending_Longitude float64 8 Member_Status object dtypes: datetime64[ns](2), float64(4), object(3) memory usage: 638.5+ MB
data["Trip_Duration_in_Minutes"] = (data["End_Time"] - data["Start_Time"])/pd.Timedelta(minutes=1)
# data.drop(['Trip_Duration'], axis=1)
Split the Start time to days and months
data["Day"] = data["Start_Time"].dt.day_name()
data["Month"] = data["Start_Time"].dt.month_name()
data.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 9299479 entries, 0 to 9299478 Data columns (total 12 columns): # Column Dtype --- ------ ----- 0 ride_id object 1 Bike_Type object 2 Start_Time datetime64[ns] 3 End_Time datetime64[ns] 4 Starting_Latitude float64 5 Starting_Longitude float64 6 Ending_Latitude float64 7 Ending_Longitude float64 8 Member_Status object 9 Trip_Duration_in_Minutes float64 10 Day object 11 Month object dtypes: datetime64[ns](2), float64(5), object(5) memory usage: 851.4+ MB
data.describe()
| Starting_Latitude | Starting_Longitude | Ending_Latitude | Ending_Longitude | Trip_Duration_in_Minutes | |
|---|---|---|---|---|---|
| count | 9.299479e+06 | 9.299479e+06 | 9.290128e+06 | 9.290128e+06 | 9.299479e+06 |
| mean | 4.190158e+01 | -8.764578e+01 | 4.190184e+01 | -8.764601e+01 | 2.221078e+01 |
| std | 4.575763e-02 | 2.860864e-02 | 4.584665e-02 | 2.840173e-02 | 2.714013e+02 |
| min | 4.164000e+01 | -8.787000e+01 | 4.139000e+01 | -8.897000e+01 | -2.904997e+04 |
| 25% | 4.188160e+01 | -8.766000e+01 | 4.188189e+01 | -8.766000e+01 | 6.866667e+00 |
| 50% | 4.189897e+01 | -8.764173e+01 | 4.189964e+01 | -8.764274e+01 | 1.240000e+01 |
| 75% | 4.192922e+01 | -8.762773e+01 | 4.192955e+01 | -8.762775e+01 | 2.285000e+01 |
| max | 4.563503e+01 | -7.379648e+01 | 4.216812e+01 | -8.744000e+01 | 5.594415e+04 |
data = data.astype({'ride_id':'string', 'Bike_Type':'category', 'Member_Status':'category'})
# Coverting datatypes of each columns.
data.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 9299479 entries, 0 to 9299478 Data columns (total 12 columns): # Column Dtype --- ------ ----- 0 ride_id string 1 Bike_Type category 2 Start_Time datetime64[ns] 3 End_Time datetime64[ns] 4 Starting_Latitude float64 5 Starting_Longitude float64 6 Ending_Latitude float64 7 Ending_Longitude float64 8 Member_Status category 9 Trip_Duration_in_Minutes float64 10 Day object 11 Month object dtypes: category(2), datetime64[ns](2), float64(5), object(2), string(1) memory usage: 727.2+ MB
data.shape
(9299479, 12)
data['Trip_Duration_in_Minutes']=data['Trip_Duration_in_Minutes'].astype('int32')
data.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 9299479 entries, 0 to 9299478 Data columns (total 12 columns): # Column Dtype --- ------ ----- 0 ride_id string 1 Bike_Type category 2 Start_Time datetime64[ns] 3 End_Time datetime64[ns] 4 Starting_Latitude float64 5 Starting_Longitude float64 6 Ending_Latitude float64 7 Ending_Longitude float64 8 Member_Status category 9 Trip_Duration_in_Minutes int32 10 Day object 11 Month object dtypes: category(2), datetime64[ns](2), float64(4), int32(1), object(2), string(1) memory usage: 691.8+ MB
# Sorting Values by "Trip_Duration" column in Ascending order.
data.sort_values(by = 'Trip_Duration_in_Minutes')
data.sample(5)
| ride_id | Bike_Type | Start_Time | End_Time | Starting_Latitude | Starting_Longitude | Ending_Latitude | Ending_Longitude | Member_Status | Trip_Duration_in_Minutes | Day | Month | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 279470 | 860E31DB4F19B3EB | docked_bike | 2020-06-17 16:54:44 | 2020-06-17 17:14:18 | 41.877641 | -87.649617 | 41.856594 | -87.627542 | member | 19 | Wednesday | June |
| 8179085 | B8EB286855A44F70 | classic_bike | 2021-12-14 09:09:12 | 2021-12-14 09:20:16 | 41.967094 | -87.679028 | 41.943791 | -87.671258 | member | 11 | Tuesday | December |
| 5241951 | 51672E8D857A49C2 | electric_bike | 2021-07-22 19:15:34 | 2021-07-22 19:25:02 | 41.909585 | -87.648065 | 41.937674 | -87.644335 | casual | 9 | Thursday | July |
| 768721 | 48E870EFB7A965A7 | docked_bike | 2020-07-04 18:51:40 | 2020-07-04 18:56:49 | 41.926277 | -87.630834 | 41.926277 | -87.630834 | member | 5 | Saturday | July |
| 4040540 | 25CFBC3FD22EB267 | electric_bike | 2021-05-22 13:13:11 | 2021-05-22 13:43:53 | 41.900000 | -87.680000 | 41.910000 | -87.630000 | casual | 30 | Saturday | May |
Its seen that many rows in some months contained negative values. Such errors happened because the "ending time" is earlier than the "starting time" in their respective rows.
Number of rows containing Negative Values.
data[data['Trip_Duration_in_Minutes'] < 0].count()
ride_id 1300 Bike_Type 1300 Start_Time 1300 End_Time 1300 Starting_Latitude 1300 Starting_Longitude 1300 Ending_Latitude 1294 Ending_Longitude 1294 Member_Status 1300 Trip_Duration_in_Minutes 1300 Day 1300 Month 1300 dtype: int64
Number of rows containing "trip duration" less than "1" minute.
data[data['Trip_Duration_in_Minutes'] < 1].count()
ride_id 153546 Bike_Type 153546 Start_Time 153546 End_Time 153546 Starting_Latitude 153546 Starting_Longitude 153546 Ending_Latitude 153420 Ending_Longitude 153420 Member_Status 153546 Trip_Duration_in_Minutes 153546 Day 153546 Month 153546 dtype: int64
Removing 153546 rows containing negative values & ride length less than 1 minute. Any trips that were below 60 seconds in length are potentially false starts or users trying to re-dock a bike to ensure it was secure.¶
data = data[data['Trip_Duration_in_Minutes'] >= 1]
data = data.reset_index()
data = data.drop(columns=['index'])
data[data['Trip_Duration_in_Minutes'] < 1].count()
ride_id 0 Bike_Type 0 Start_Time 0 End_Time 0 Starting_Latitude 0 Starting_Longitude 0 Ending_Latitude 0 Ending_Longitude 0 Member_Status 0 Trip_Duration_in_Minutes 0 Day 0 Month 0 dtype: int64
data.shape
(9145933, 12)
Boxplot of column "Trip Duration" to see the distribution of data between Member and Casual Rider.
sns.boxplot(data = data,
x ='Member_Status',
y = 'Trip_Duration_in_Minutes',
order = ['member', 'casual']);
Let's check if we have negative records of trip duration
sns.set(color_codes=True)
sns.boxplot(x=data['Trip_Duration_in_Minutes']);
Let's see the number of riders past 24hrs
data_time_high = data[data['Trip_Duration_in_Minutes']>24]
data_time_high.count()
ride_id 2040847 Bike_Type 2040847 Start_Time 2040847 End_Time 2040847 Starting_Latitude 2040847 Starting_Longitude 2040847 Ending_Latitude 2032988 Ending_Longitude 2032988 Member_Status 2040847 Trip_Duration_in_Minutes 2040847 Day 2040847 Month 2040847 dtype: int64
Roughly 22% of rides last over 24 hours.
data.groupby(['Day', 'Member_Status'])['Trip_Duration_in_Minutes'].mean()
Day Member_Status
Friday casual 33.748347
member 13.629047
Monday casual 35.297851
member 13.308575
Saturday casual 38.444986
member 15.545628
Sunday casual 41.508620
member 15.881167
Thursday casual 32.557172
member 13.052681
Tuesday casual 30.958143
member 12.911170
Wednesday casual 31.442476
member 13.099418
Name: Trip_Duration_in_Minutes, dtype: float64
data.isnull().sum()
ride_id 0 Bike_Type 0 Start_Time 0 End_Time 0 Starting_Latitude 0 Starting_Longitude 0 Ending_Latitude 9225 Ending_Longitude 9225 Member_Status 0 Trip_Duration_in_Minutes 0 Day 0 Month 0 dtype: int64
data = data.dropna()
data.isnull().sum()
ride_id 0 Bike_Type 0 Start_Time 0 End_Time 0 Starting_Latitude 0 Starting_Longitude 0 Ending_Latitude 0 Ending_Longitude 0 Member_Status 0 Trip_Duration_in_Minutes 0 Day 0 Month 0 dtype: int64
data.columns
Index(['ride_id', 'Bike_Type', 'Start_Time', 'End_Time', 'Starting_Latitude',
'Starting_Longitude', 'Ending_Latitude', 'Ending_Longitude',
'Member_Status', 'Trip_Duration_in_Minutes', 'Day', 'Month'],
dtype='object')
data.Member_Status.value_counts()
member 5206619 casual 3930089 Name: Member_Status, dtype: int64
fig_1 = data.groupby('Member_Status', as_index=False).count()
px.bar(fig_1, y = 'Member_Status', x = 'ride_id', range_x = [0,6000000],
color = 'Member_Status',
height = 300,
text = 'ride_id',
labels = {'ride_id': 'No. of Rides', 'Member_Status': 'Member/Casual'},
hover_name = 'Member_Status', hover_data = {'Member_Status': False, 'Month': False, 'ride_id': True},
color_discrete_map = {'casual': '#FF934F', 'member': '#058ED9'})
data['Month'].value_counts()
August 1400770 July 1351622 September 1266152 June 1055097 October 1000267 April 695773 November 608770 May 522794 March 504034 December 372874 January 197146 February 161409 Name: Month, dtype: int64
data['Year'] = data['Start_Time'].dt.year
data.sample()
| ride_id | Bike_Type | Start_Time | End_Time | Starting_Latitude | Starting_Longitude | Ending_Latitude | Ending_Longitude | Member_Status | Trip_Duration_in_Minutes | Day | Month | Year | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4549989 | 5F181A3B982D5125 | classic_bike | 2021-06-12 12:57:13 | 2021-06-12 13:08:21 | 41.903222 | -87.634324 | 41.910522 | -87.653106 | casual | 11 | Saturday | June | 2021 |
monthly_rides = pd.pivot_table(data,
index = ['Year', 'Month', 'Member_Status'],
values = 'ride_id',
aggfunc = ['count'],
margins = True,
margins_name = 'Total Count')
monthly_rides = monthly_rides.loc[(monthly_rides != 0).any(axis=1)]
monthly_rides
| count | |||
|---|---|---|---|
| ride_id | |||
| Year | Month | Member_Status | |
| 2020 | August | casual | 283818 |
| member | 324706 | ||
| December | casual | 29631 | |
| member | 99757 | ||
| July | casual | 266294 | |
| member | 276295 | ||
| June | casual | 153113 | |
| member | 184996 | ||
| November | casual | 86640 | |
| member | 168247 | ||
| October | casual | 141744 | |
| member | 237716 | ||
| September | casual | 226517 | |
| member | 295041 | ||
| 2021 | April | casual | 134760 |
| member | 197397 | ||
| August | casual | 406961 | |
| member | 385285 | ||
| December | casual | 68629 | |
| member | 174857 | ||
| February | casual | 9906 | |
| member | 38529 | ||
| January | casual | 17820 | |
| member | 77493 | ||
| July | casual | 435345 | |
| member | 373688 | ||
| June | casual | 364502 | |
| member | 352486 | ||
| March | casual | 83038 | |
| member | 142318 | ||
| May | casual | 253008 | |
| member | 269786 | ||
| November | casual | 105230 | |
| member | 248653 | ||
| October | casual | 253415 | |
| member | 367392 | ||
| September | casual | 358806 | |
| member | 385788 | ||
| 2022 | April | casual | 123784 |
| member | 239832 | ||
| February | casual | 20926 | |
| member | 92048 | ||
| January | casual | 18097 | |
| member | 83736 | ||
| March | casual | 88105 | |
| member | 190573 | ||
| Total Count | 9136708 |
fig_2 = data.groupby(['Year', 'Month', 'Member_Status'], as_index=False).count()
fig_2 = fig_2[fig_2['ride_id'] != 0]
px.line(fig_2, x = 'Month', y = 'ride_id', range_y = [0,600000],
color = 'Member_Status',
line_shape = 'spline',
markers=True,
labels = {'ride_id': 'No. of Rides', 'Month': 'Months (Oct 2020 - Apr 2022)', 'Member_Status': 'Member/Casual'},
hover_name = 'Member_Status', hover_data = {'Member_Status': False, 'Month': True, 'ride_id': True},
color_discrete_map = {'Casual': '#FF934F', 'Member': '#058ED9'})
pd.pivot_table(data,
index = ['Day', 'Member_Status'],
values = 'ride_id',
aggfunc = ['count'],
margins = True,
margins_name = 'Total Count')
| count | ||
|---|---|---|
| ride_id | ||
| Day | Member_Status | |
| Friday | casual | 565702 |
| member | 754748 | |
| Monday | casual | 436190 |
| member | 707221 | |
| Saturday | casual | 876976 |
| member | 742152 | |
| Sunday | casual | 734030 |
| member | 641194 | |
| Thursday | casual | 459930 |
| member | 778935 | |
| Tuesday | casual | 419783 |
| member | 781366 | |
| Wednesday | casual | 437478 |
| member | 801003 | |
| Total Count | 9136708 |
Saturday is by far the day with the most rides for casual riders (876976), followed by Saturday The day with the most rides for member riders is Wednesday (8010030) followed by Tuesday (781366).
fig_3 = data.groupby(['Day', 'Member_Status'], as_index=False).count()
px.line(fig_3, x = 'Day', y = 'ride_id', range_y = [0,1200000],
color = 'Member_Status',
line_shape = 'spline',
markers=True,
labels = {'ride_id': 'No. of Rides', 'Day': 'Weekdays', 'Member_Status': 'Member/Casual'},
hover_name = 'Member_Status', hover_data = {'Member_Status': False, 'Month': False, 'ride_id': True},
color_discrete_map = {'casual': '#FF934F', 'member': '#058ED9'})
fig_6 = round(data.groupby(['Day', 'Member_Status'], as_index=False).mean(),2)
px.bar(fig_6, x = 'Day', y = 'Trip_Duration_in_Minutes',
color = 'Member_Status',
barmode='group',
text = 'Trip_Duration_in_Minutes',
labels = {'Trip_Duration_in_Minutes': 'Average Ride Length (minutes)', 'Member_Status': 'Member/Casual', 'Day': 'Weekdays'},
hover_name = 'Member_Status', hover_data = {'Member_Status': False, 'Trip_Duration_in_Minutes': True},
color_discrete_map = {'casual': '#FF934F', 'member': '#058ED9'})
Average Trip Duration 2020-2022
pd.pivot_table(data,
index = 'Member_Status',
values = ['Trip_Duration_in_Minutes'],
aggfunc = ['mean'],
margins = True,
margins_name = 'Total Ride Average')
| mean | |
|---|---|
| Trip_Duration_in_Minutes | |
| Member_Status | |
| casual | 33.977449 |
| member | 13.608014 |
| Total Ride Average | 22.369779 |
fig_4 = round(data.groupby('Member_Status', as_index=False).mean(),2)
px.bar(fig_4, y = 'Member_Status', x = 'Trip_Duration_in_Minutes', range_x = [0,35],
color = 'Member_Status',
height = 300,
text = 'Trip_Duration_in_Minutes',
labels = {'Trip_Duration_in_Minutes': 'Average Ride Length (minutes)', 'Member_Status': 'Member/Casual'},
hover_name = 'Member_Status', hover_data = {'Member_Status': False, 'Trip_Duration_in_Minutes': True},
color_discrete_map = {'casual': '#FF934F', 'member': '#058ED9'})
- The average ride length of casual riders are more than twice of members.
Monthly Average Trip Duration 2020-2022
pd.pivot_table(data,
index = ['Year', 'Month', 'Member_Status'],
values = ['Trip_Duration_in_Minutes'],
aggfunc = ['mean'],
margins = True,
margins_name = 'Total Trip Duration')
| mean | |||
|---|---|---|---|
| Trip_Duration_in_Minutes | |||
| Year | Month | Member_Status | |
| 2020 | August | casual | 44.184449 |
| member | 16.414344 | ||
| December | casual | 25.595997 | |
| member | 12.158796 | ||
| July | casual | 59.217222 | |
| member | 17.354874 | ||
| June | casual | 51.145984 | |
| member | 18.317720 | ||
| November | casual | 31.294402 | |
| member | 13.129803 | ||
| October | casual | 29.784294 | |
| member | 13.626066 | ||
| September | casual | 37.681269 | |
| member | 15.108571 | ||
| 2021 | April | casual | 36.562110 |
| member | 14.187880 | ||
| August | casual | 27.268576 | |
| member | 13.558849 | ||
| December | casual | 21.093168 | |
| member | 10.520568 | ||
| February | casual | 44.695437 | |
| member | 15.309741 | ||
| January | casual | 24.347194 | |
| member | 12.181436 | ||
| July | casual | 31.380498 | |
| member | 13.751986 | ||
| June | casual | 35.702048 | |
| member | 14.092180 | ||
| March | casual | 36.778535 | |
| member | 13.498531 | ||
| May | casual | 37.001806 | |
| member | 14.186233 | ||
| November | casual | 19.881925 | |
| member | 10.783188 | ||
| October | casual | 24.002478 | |
| member | 11.977659 | ||
| September | casual | 26.517895 | |
| member | 13.205019 | ||
| 2022 | April | casual | 23.297292 |
| member | 11.101971 | ||
| February | casual | 22.050416 | |
| member | 10.814564 | ||
| January | casual | 23.720230 | |
| member | 11.319504 | ||
| March | casual | 25.691493 | |
| member | 11.463702 | ||
| Total Trip Duration | 22.369779 |
fig_5 = round(data.groupby(['Year', 'Month', 'Member_Status'], as_index=False).mean(),2).dropna()
px.bar(fig_5, x = 'Month', y = 'Trip_Duration_in_Minutes',
color = 'Member_Status',
barmode='group',
text = 'Trip_Duration_in_Minutes',
labels = {'Trip_Duration_in_Minutes': 'Average Ride Length (minutes)', 'Member_Status': 'Member/Casual', 'Month': 'Months (June 2020 - April 2022)'},
hover_name = 'Member_Status', hover_data = {'Member_Status': False, 'Trip_Duration_in_Minutes': True},
color_discrete_map = {'casual': '#FF934F', 'member': '#058ED9'})
Analyzing Difference in Bike Type Usage Between Casual riders and Members
pd.pivot_table(data,
index = ['Bike_Type', 'Member_Status'],
values = ['ride_id'],
aggfunc = ['count'],
margins = True,
margins_name = 'Total Rides')
| count | ||
|---|---|---|
| ride_id | ||
| Bike_Type | Member_Status | |
| classic_bike | casual | 1358016 |
| member | 2325746 | |
| docked_bike | casual | 1306325 |
| member | 1238142 | |
| electric_bike | casual | 1265748 |
| member | 1642731 | |
| Total Rides | 9136708 |
fig_7 = data.groupby(['Bike_Type', 'Member_Status'], as_index=False).count()
px.bar(fig_7, x = 'Bike_Type', y = 'ride_id',
color = 'Member_Status',
barmode='group',
text = 'ride_id',
labels = {'ride_id': 'No. of Rides', 'Member_Status': 'Member/Casual', 'Bike_Type' : 'Bike Type'},
hover_name = 'Member_Status', hover_data = {'Member_Status': False, 'Trip_Duration_in_Minutes': False},
color_discrete_map = {'casual': '#FF934F', 'member': '#058ED9'})
The average ride length of causual riders are more than twice as of members.
From the average ride length difference, we can conclude that Annual members usually use bike share for daily commuting, while casual riders mostly use bike share for leisure rides mostly during Weekends.
But there are a fixed number of casual riders who use bike share for commuting.
A new Annual Membership package for Weekend usage only will attract current Weekend casual riders.
Promotions aiming at current Weekday casual riders must be implemented as soon as possible. Those promtions must include the financial savings of taking membership when compared to single passes and full day passes for a year long period.
A Loyalty Program for casual riders can be implemented, where occasional membership fees discounts must be given to casual riders with high loyalty points.
data.to_csv("C:/Users/kariu/OneDrive/Desktop/Data Science/Google_Capstone_Project/Cyclistic Trip Data/cyclistic_data_merged.csv", index=False)